resulted in a nonsynonymous codon that is translated into an amino acid with different physicochemical properties. For instance, if a hydrophobic amino acid is replaced by another hydrophobic amino acid, SIFT will predict that the change is tolerated; however, if it is substituted with a polar amino acid, the variant will be predicted as deleterious. The SIFT algorithm relies on NCBI PSI-BLAST, using the translated protein as a query sequence against a database of protein sequences. The hit sequences are aligned by multiple sequence alignment (MSA), and the probabilities of all possible substitutions at each position are computed to form a position-specific scoring matrix (PSSM), where each entry in the matrix represents the probability of observing an amino acid in that column of the alignment. The probabilities are normalized based on the consensus amino acids, so the normalized probability at each position ranges between 0 and 1. SIFT predicts that an SNV with a probability between 0.0 and 0.05 at that position is deleterious and will affect the function of the protein, whereas a probability greater than 0.05 can be tolerated. SIFT also measures conservation of the sequence using the median sequence conservation, which ranges from 0 to log2(20), that is, from 0 to 4.32. A median sequence conservation of 4.32 indicates that all sequences in the alignment are identical to each other, and hence any variant in this region will be predicted as damaging. SIFT also reports the number of sequences at the variant position. The latest version of SIFT is SIFT 4G (SIFT for genomes), which is faster, enables practical computations on reference genomes using precomputed databases, and provides SIFT predictions for more organisms. Hundreds of databases for different organisms are available.
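The decision rule and the conservation scale described above can be illustrated with a minimal shell sketch. It is not part of SIFT itself: the function name classify_sift, the tab-separated input layout, and the variant labels are hypothetical and chosen only for illustration. The cutoff follows the text, so a score of 0.05 or less is labeled deleterious and anything greater is labeled tolerated.

```shell
#!/bin/sh
# Upper bound of the conservation scale quoted in the text: log2(20).
awk 'BEGIN { printf "%.2f\n", log(20) / log(2) }'

# classify_sift (hypothetical helper): reads "variant<TAB>score" lines on stdin
# and appends a label using the 0.05 cutoff described in the text.
classify_sift() {
    awk -F '\t' -v OFS='\t' \
        '{ print $1, $2, ($2 <= 0.05 ? "deleterious" : "tolerated") }'
}

# Made-up variants and scores, for illustration only.
printf 'A123V\t0.01\nL45I\t0.62\nG7R\t0.05\n' | classify_sift
```

The first command prints 4.32, matching the upper limit given for the median sequence conservation. The three example variants are labeled deleterious, tolerated, and deleterious, respectively.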

Use the following steps to annotate variants using SIFT 4G on a Linux terminal:

First, create a directory with a name of your choice, such as “sift4g”, and change into it.

mkdir sift4g

cd sift4g

Open “https://sift.bii.a-star.edu.sg/sift4g/public/”. You will see databases of tens of organisms. Scroll down to the database of interest, open its folder, and download the appropriate database build into your working directory. Since the variants called above come from human samples, we can download the latest human build, GRCh38.78, by copying its link and using the “wget” command, and then extract it using “unzip” or follow the instructions.

wget https://sift.bii.a-star.edu.sg/sift4g/public/Homo_sapiens/GRCh38.78.zip

unzip GRCh38.78.zip

Each chromosome will have three files: a compressed file with the “gz” file extension, a region file with the “.region” file extension, and a chromosome statistics file with the “.txt” file extension.
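As a quick sanity check after extraction, the files of each type can be counted; if every chromosome has its three files, the three counts should be equal. The sketch below is only an illustration under stated assumptions: the helper names count_by_pattern and summarize_sift_db are hypothetical, the archive is assumed to extract into a directory named GRCh38.78, and the pattern '*.region*' is deliberately loose so that it matches the region files whatever their exact suffix.

```shell
#!/bin/sh
# count_by_pattern (hypothetical helper): count regular files under directory $1
# whose names match the shell pattern $2.
count_by_pattern() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

# summarize_sift_db (hypothetical helper): print one count per file type.
# Equal counts suggest that every chromosome has its three files.
summarize_sift_db() {
    db_dir=$1
    gz=$(count_by_pattern "$db_dir" '*.gz')
    region=$(count_by_pattern "$db_dir" '*.region*')
    txt=$(count_by_pattern "$db_dir" '*.txt')
    echo "gz=$gz region=$region txt=$txt"
}

# Assumption: unzip created a directory called GRCh38.78 in the working directory.
if [ -d GRCh38.78 ]; then
    summarize_sift_db GRCh38.78
fi
```

If the counts differ, the download or extraction was probably incomplete and is worth repeating before annotation.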

Download the SIFT 4G Annotator Java executable file (.jar) into a directory of your choice or into your working directory:

wget https://github.com/paulineng/SIFT4G_Annotator/raw/master/SIFT4G_Annotator.jar
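A download from a web link can occasionally save an HTML error page instead of the intended file, so it may be worth confirming that the saved file looks like a JAR. The sketch below relies on the fact that a JAR is a ZIP archive and that ZIP files begin with the two bytes "PK". The helper name looks_like_jar is hypothetical, and the check is only a rough screen, not a full validation of the archive.

```shell
#!/bin/sh
# looks_like_jar (hypothetical helper): succeed if file $1 exists, is non-empty,
# and starts with the ZIP signature "PK" that every JAR file carries.
looks_like_jar() {
    [ -s "$1" ] && [ "$(head -c 2 "$1")" = "PK" ]
}

# Check the file downloaded in the previous step (run from the same directory).
if looks_like_jar SIFT4G_Annotator.jar; then
    echo "SIFT4G_Annotator.jar looks like a valid JAR"
else
    echo "SIFT4G_Annotator.jar is missing or is not a ZIP/JAR file"
fi
```

If the second message appears, repeat the download and confirm that the link was copied in full.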